For ONVIF TTS audio proposal, to support device with TTS function by Peggy0422 · Pull Request #694 · onvif/specs

Peggy0422 · 2025-12-03T10:54:56Z

To support audio products with a TTS function, the following changes are proposed:

Add TTSCapabilities (optional): indicates whether the device supports the TTS function and, if so, its corresponding TTS configuration. The complex type "TTSCapabilities" is added to the existing complex type "AudioClipCapabilities".
Parameters:

MaxContentLength: the maximum length of the text that the device can convert to an audio clip.
TTSLanguage: the language(s) the device supports for the TTS function.
TTSVoiceType: the voice types the device supports for the TTS function.
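As an illustration of how a client might use these capabilities, here is a minimal Python sketch that checks a TTS request against them before sending it. The class and function names are hypothetical, the voice type value "Female" is made up for the example, and it assumes MaxContentLength counts characters, which the proposal does not state.

```python
from dataclasses import dataclass, field


@dataclass
class TTSCapabilities:
    """Hypothetical client-side mirror of the proposed TTSCapabilities type."""
    max_content_length: int
    languages: list = field(default_factory=list)
    voice_types: list = field(default_factory=list)


def check_tts_request(caps, content, language, voice_type):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    # Assumption: the limit is measured in characters, not bytes.
    if len(content) > caps.max_content_length:
        problems.append(
            f"content has {len(content)} characters, "
            f"limit is {caps.max_content_length}"
        )
    if language not in caps.languages:
        problems.append(f"language {language!r} is not advertised by the device")
    if voice_type not in caps.voice_types:
        problems.append(f"voice type {voice_type!r} is not advertised by the device")
    return problems


caps = TTSCapabilities(max_content_length=200,
                       languages=["en-US", "zh-CN"],
                       voice_types=["Female"])
print(check_tts_request(caps, "The door is open.", "en-US", "Female"))
```

Checking on the client side avoids sending a request the device would reject anyway; the device remains the authority on what it accepts.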

Add "AddTTSAudioClip" and "AddTTSAudioClipResponse": the client sends a text, a TTS configuration and an audio clip configuration to the device. The device converts the text to an audio clip based on the TTS configuration and subsequently plays this audio clip according to the audio clip configuration.
Parameters:

Token (optional): token for the audio clip.
Configuration: audio clip configuration to add, see element "Configuration".
TTSConfiguration: for a TTS audio clip, specifies the audio content, language and voice type used when the device plays this audio clip.
Response:
Token: unique token of the TTS audio clip to be uploaded.
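To make the request shape concrete, the following Python sketch builds the body of an AddTTSAudioClip request with the standard library. It is only a sketch of the proposal as described above: the element order follows the parameter list, the child elements Content, Language and VoiceType are assumed to live in the Media2 (tr2) namespace, the Configuration element is left empty as a placeholder, and the function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Media2 service namespace, conventionally bound to the prefix "tr2".
TR2_NS = "http://www.onvif.org/ver20/media/wsdl"


def build_add_tts_audio_clip(content, language, voice_type, token=None):
    """Build an AddTTSAudioClip element (SOAP envelope not included)."""
    ET.register_namespace("tr2", TR2_NS)
    request = ET.Element(f"{{{TR2_NS}}}AddTTSAudioClip")
    if token is not None:
        # Token is optional in the proposal.
        ET.SubElement(request, f"{{{TR2_NS}}}Token").text = token
    # Placeholder: the audio clip configuration fields are not filled in here.
    ET.SubElement(request, f"{{{TR2_NS}}}Configuration")
    tts = ET.SubElement(request, f"{{{TR2_NS}}}TTSConfiguration")
    for name, value in (("Content", content),
                        ("Language", language),
                        ("VoiceType", voice_type)):
        ET.SubElement(tts, f"{{{TR2_NS}}}{name}").text = value
    return request


request = build_add_tts_audio_clip("The door is open.", "en-US", "Female")
print(ET.tostring(request, encoding="unicode"))
```

The returned element still has to be wrapped in a SOAP envelope with the usual ONVIF authentication headers before it can be sent to a device.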

media2.wsdl

Updated complexType "AudioClipCapabilities" with element "TTSCapabilities"; added complexType "TTSCapabilities" with attributes "MaxContentLength", "TTSLanguage" and "TTSVoiceType"; added simpleType "TTSLanguage" and "TTSVoiceType".
Added elements "AddTTSAudioClip" and "AddTTSAudioClipResponse" for sending a text, TTS configuration and audio clip configuration to the device.
Added complexType "TTSAudio" for element "TTSConfiguration". It includes parameters such as Content, Language, VoiceType.
Added "AddTTSAudioClipRequest" and "AddTTSAudioClipResponse"

media2.xml and documentation

Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses.
Updated audio clip capabilities with TTSCapabilities.

1. Added AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device (1621-1652) (2036-2041) (2418-2422) (2935-2943).
2. Added complex type "TTSAudio" (1465-1485) for TTSConfiguration to support the TTS function. It includes the parameters Content, Language and VoiceType.
3. Updated AudioClipCapabilities with TTSCapabilities (177-181), and added complex type TTSCapabilities (201-220) to indicate that the device supports the TTS function and its corresponding configuration. TTSCapabilities includes MaxContentLength, TTSLanguage and TTSVoiceType.
4. Added simpleType TTSLanguage (220-231) and TTSVoiceType (232-238).

1. Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses (2359-2416).
2. Updated audio clip capabilities with TTSCapabilities (2698-2700).

update code line information for TTS function

correct some editorial errors

Updated the description of the AddTTSAudioClip operation to clarify the parameters and response. Updated the description of TTSCapabilities.

The TTS audio clip pull request was first created as #668.

Updated TTS configuration description and added TTSCapabilities entry.

sujithhanwha · 2025-12-04T12:34:32Z

OLD PR for reference
#668

venki5685 · 2025-12-09T07:14:24Z

doc/Media2.xml

+          </varlistentry>
+        </variablelist>
+        <para></para>
+        <para><emphasis role="bold">Note:</emphasis> Audio clip uploads to the device can fail in the following scenarios, and a specific HTTP error code should be returned to the client when an upload fails.</para>


This note does not seem applicable to TTSAudioClip.

Yes, it is not for TTS, I will delete it.

delete inappropriate note for OPTION AddTTSAudioClip

johado

Some small textual comments.

johado · 2025-12-16T11:24:51Z

doc/Media2.xml

+        <title>AddTTSAudioClip</title>
+        <para>This operation adds a text, audio clip configuration and TTS configuration to the device, for device converting the text to an audio clip based on the TTS configuration. 
+			The response to the command includes a unique token for this converted audio clip. 
+			If the device is unable to support language specified in the TTS configuration, the associated configuration will deleted from the device.</para>


add "be" to "will be deleted"

Okay, got it.

johado · 2025-12-16T11:25:39Z

doc/Media2.xml

+            <term>response</term>
+            <listitem>
+              <para role="param">Token - [tt:ReferenceToken]</para>
+              <para role="text">Unique token of the TTS audio clip to be uploaded.</para>             


Change "to be uploaded" to "that was added" ?

Thank you very much for your advice. We are considering using the word "assign", which should be more precise.

johado · 2025-12-16T11:26:51Z

doc/Media2.xml

+        </varlistentry>
+		<varlistentry>
+          <term>TTSCapabilities</term>
+          <listitem><para>Indicates device supports TTS function and TTS configuration.See tr2: TTSCapabilities.</para></listitem>


Add space after .: "..configuration. See tr2:..."

Okay, thank you.

johado · 2025-12-16T11:32:09Z

wsdl/ver20/media/wsdl/media.wsdl

+                 </xs:element>
+                 <xs:element name="Language" type="xs:string">
+                     <xs:annotation>
+                         <xs:documentation>Language for the TTS audio clip playback. See tr2: TTSLanguage. </xs:documentation>


Change to "See tr2:TTSLanguage and TTSCapabilities." ?

Thank you for your opinion. TTSLanguage is already an attribute within TTSCapabilities. If we want to point out that the language for TTS audio clip playback must be one of the languages supported by the device, we could consider revising the explanation to state this clearly, for example: "The language which is supported and used for TTS audio clip playback."

johado · 2025-12-16T11:32:50Z

wsdl/ver20/media/wsdl/media.wsdl

+                 </xs:element>
+                 <xs:element name="VoiceType" type="xs:string">
+                     <xs:annotation>
+                         <xs:documentation>The voice type for the TTS audio clip playback. See tr2: TTSVoiceType.</xs:documentation>


Change to "See tr2:TTSVoiceType and TTSCapabilities." ?

I propose to update the explanation for TTSVoiceType in the same way as the commit for TTSLanguage.

robberos · 2025-12-16T12:04:07Z

wsdl/ver20/media/wsdl/media.wsdl

+					<xs:sequence>
+						<xs:element name="Token" type="tt:ReferenceToken">						
+							<xs:annotation>
+								<xs:documentation>Unique token of the TTS audio clip to be uploaded.</xs:documentation>


change "to be uploaded" to something more relevant. converted, generated, ..?

Thank you very much for bringing it up. Yes, we are considering changing it and using the word "assign", which should be more precise.

robberos · 2025-12-16T13:02:51Z

wsdl/ver20/media/wsdl/media.wsdl

+                <xs:anyAttribute processContents="lax"/>
+            </xs:complexType>
+            <!--===============TTS Language================-->
+            <xs:simpleType name="TTSLanguage">


What is the reasoning behind the choice of languages in the list below?

Is there any standard for official language names that can be referred to?

TTSCapabilities and TTSAudio use open strings, so the enum should provide a good pattern.

https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes ?

Thank you so much for your comments! We truly appreciate your input and have been carefully considering how to best define these general concepts. Your mention of ISO international standards was particularly helpful and guided our further research. We also looked into RFC 5646 for language representation across countries. So we would like to use alpha-2 codes to represent languages and countries, as recommended in ISO 639-1 and ISO 3166-1. For languages with regional variations, we plan to adopt the language-country format (e.g., en-US, zh-CN). Thank you again for your feedback.
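For illustration, the "language" or "language-country" format described in this reply can be checked with a small regular expression. The Python sketch below is an assumption about how a client or test tool might validate the shape of a tag; the function name is hypothetical, and it only checks the two-letter pattern, not whether the codes are actually assigned in ISO 639-1 or ISO 3166-1.

```python
import re

# Two lowercase letters (language), optionally followed by a hyphen and
# two uppercase letters (country), for example "en" or "en-US".
_LANGUAGE_TAG = re.compile(r"[a-z]{2}(-[A-Z]{2})?")


def is_valid_tts_language(tag):
    """Return True if tag has the shape 'll' or 'll-CC'."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


for candidate in ("en", "en-US", "zh-CN", "eng", "en_US"):
    print(candidate, is_valid_tts_language(candidate))
```

RFC 5646 itself treats tags as case-insensitive and allows more subtags, so a stricter or looser check may be appropriate depending on what the final schema text requires.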

robberos · 2025-12-16T15:24:43Z

doc/Media2.xml

        </itemizedlist>              
-      </section>    
+      </section>  
+	  <section xml:id="section_wvd_dzg_rye">


The id should be unique in XML, right? It seems to be a copy of the id of the SetAudioClip section below.

Yes, thank you for the suggestion. I have revised it accordingly.

update description for TTSLanguage and TTSVoiceType

Update documentation for Token element for AddTTSAudioClip response

Updated TTSLanguage type to include ISO language and country codes with documentation.

ocampana-videotec · 2026-02-03T08:50:10Z

wsdl/ver20/media/wsdl/media.wsdl

+                       See <a href="https://www.iso.org/obp/ui/">ISO Country Codes</a>.
+                  </xs:documentation>
+              </xs:annotation>
+                 <xs:restriction base="xs:string">


Do we really need to make an explicit restriction here and not just define it as a string? If we go this way, whenever we need to add a language we need to update the WSDL file.

Thank you very much for your comment! Yes, this is an important issue we should consider.
Previously, we defined languages using string format and listed commonly used or potentially needed languages. However, this approach does introduce a maintenance burden: as you pointed out, each new language addition would require updating the WSDL file. To address this, we now directly reference ISO-standard language codes via strings. Users may refer to the official ISO codes for specific needs, while the WSDL only defines the reference rules. The examples in TTSLanguage are provided for convenience. I hope this clarifies the approach. Thank you again for your comment!

Added note about enumeration values being illustrative in TTSLanguage.

Revise the description of language definition in TTScapability and TTSAudio

kieran242

A couple of requested changes

kieran242 · 2026-03-31T13:52:58Z

wsdl/ver20/media/wsdl/media.wsdl

+                    <xs:annotation>
+                        <xs:documentation> 
+                            List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
+                            Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.


The link supplied ("https://www.iso.org/obp/ui/") to reference ISO 3166-1 does not direct you to the standard; instead it takes you to a different page. Can we fix this reference, please? :)

Sure, I'll replace the link with the direct reference immediately (https://www.iso.org/obp/ui/#search/code/). Thank you for pointing this out.

kieran242 · 2026-03-31T14:01:28Z

wsdl/ver20/media/wsdl/media.wsdl

+                         <xs:documentation>
+                             The language that is supported by the device and used for TTS audio clip playback. 
+                             Uses ISO 639-1 alpha-2 language codes for definition, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
+                             Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.


As per my previous 2 comments above. Please correct here also.

Understood, I have already corrected this section as well. Thank you for the reminder.

kieran242 · 2026-03-31T14:02:16Z

wsdl/ver20/media/wsdl/media.wsdl

+                    <xs:annotation>
+                        <xs:documentation> 
+                            List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
+                            Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.


Suggested change

- Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
+ Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166-1 Country Codes</a>.

Yes, that makes it more accurate. I've updated the relevant section. Thank you for your advice! :)

kieran242 · 2026-03-31T15:14:50Z

@Peggy0422 When you use the "AddTTSAudioClip" API, is the "TTSConfiguration" stored on the device with the audio clip, or is it just used to create the audio clip dynamically? If it is stored on the device, there is no way to update it or identify it.

Further, when you request "GetAudioClips" there does not seem to be a way to tell an uploaded audio clip from a TTS audio clip other than the audio clip token returned to the user from the API. This would make updating or deleting a TTS audio clip difficult without keeping track of the tokens and your "TTSConfiguration" in some way.

Peggy0422 · 2026-04-01T07:55:28Z

@kieran242 Thank you very much for your questions. Regarding the "AddTTSAudioClip" API, the "TTSConfiguration" is used solely for generating the audio clip and is not stored on the device.

Typically, adding a TTS audio clip is the first step to enable playback on the device. When a client uses AddTTSAudioClip, the device returns a token via the AddTTSAudioClipResponse that corresponds to the generated TTS audio clip. This token serves as a unique identifier for subsequent operations, such as Get, Set or Delete.
Hope this addresses your concerns, thank you.
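Since the TTSConfiguration is not stored on the device, a client that wants to tell TTS clips apart from uploaded clips would have to remember the returned tokens itself. The Python sketch below shows one possible way to do that; the class and its methods are hypothetical client-side helpers and are not part of the proposal or of any ONVIF API.

```python
class TTSClipRegistry:
    """Hypothetical client-side record of tokens returned by AddTTSAudioClip."""

    def __init__(self):
        self._by_token = {}

    def remember(self, token, content, language, voice_type):
        """Store the TTS configuration that produced the clip with this token."""
        self._by_token[token] = {
            "Content": content,
            "Language": language,
            "VoiceType": voice_type,
        }

    def is_tts_clip(self, token):
        return token in self._by_token

    def forget(self, token):
        """Drop the record, for example after the clip was deleted."""
        self._by_token.pop(token, None)

    def split(self, tokens):
        """Partition tokens (e.g. from GetAudioClips) into TTS and other clips."""
        tts = [t for t in tokens if t in self._by_token]
        other = [t for t in tokens if t not in self._by_token]
        return tts, other


registry = TTSClipRegistry()
registry.remember("clip1", "The door is open.", "en-US", "Female")
print(registry.split(["clip1", "clip2"]))
```

Such a record only exists on the client that created the clip, so a second client querying the same device would still see all clips as indistinguishable.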

update the reference link for country code

Peggy0422 added 9 commits November 10, 2025 10:50

Update Media2.xml

d2607c7

Update media.wsdl

043366e

Update media.wsdl

43f83bf

Merge branch 'onvif:video/TTS-audio-clip' into video/TTS-audio-clip

46b11bb

Update Media2.xml

ea5b5dd

Merge pull request #692 from Peggy0422/video/TTS-audio-clip

823e174

Revise TTS configuration text and add capabilities entry

65f8b9c

Update documentation for TTS attributes and elements

7b61dc7

ocampana-videotec mentioned this pull request Dec 4, 2025

For ONVIF TTS audio proposal, to support device with TTS function #692

Merged

ocampana-videotec added the WG_enh label Dec 4, 2025

ocampana-videotec mentioned this pull request Dec 4, 2025

video/tts audio clip #668

Closed

ocampana-videotec added 26.06 IPR needed labels Dec 4, 2025

ocampana-videotec added this to the 26.06 milestone Dec 4, 2025

venki5685 reviewed Dec 9, 2025

Update Media2.xml

e4703a2

johado reviewed Dec 16, 2025

robberos reviewed Dec 16, 2025

Peggy0422 added 6 commits January 5, 2026 10:29

Fix typos in Media2.xml documentation

61ff3ab

Rename section from 'section_wvd_dzg_rye' to 'section_AddTTSAudioClip'

a9f7585

Update media.wsdl

baa4502

Update description of response token in AddTTSAudioClip in Media2.xml

79d31e3

Update documentation for Token element in media.wsdl

86e4eaa

Revise TTSLanguage type with ISO codes and explanation

226c053

ocampana-videotec reviewed Feb 3, 2026

Update documentation in media.wsdl

27eeddc

Peggy0422 added 2 commits March 4, 2026 15:46

Update media.wsdl

e835572

Update media.wsdl

2a71cbd

kieran242 requested changes Mar 31, 2026

Peggy0422 added 2 commits April 2, 2026 14:00

Update media.wsdl

3d8c89a

Update media.wsdl

088525a

